Prediction Using Note Text: Synthetic Feature Creation with word2vec

Authors

Manuel Amunategui

Tristan Markwell

Yelena Rozenfeld

Abstract

word2vec affords a simple yet powerful approach to extracting quantitative variables from unstructured textual data. Over half of healthcare data is unstructured (1) and is therefore hard to model without substantial expertise in data engineering and natural language processing. word2vec can serve as a bridge for quickly gathering intelligence from such data sources. In this study, we ran 650 megabytes of unstructured medical chart notes from the Providence Health & Services electronic medical record through word2vec. We used two different approaches to create predictive variables and tested them on the risk of readmission for patients with chronic obstructive pulmonary disease (COPD). As a comparative benchmark, we ran the same test using the LACE risk model (2), a single score based on length of stay, acuity, comorbid conditions, and emergency department visits. Using only free text and mathematical might, we found word2vec comparable to LACE in predicting the risk of readmission of COPD patients.
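The abstract does not say which two approaches were used to build predictive variables, so the following is only an illustrative sketch of one common technique: averaging the word vectors of a note to obtain a fixed-length feature vector. The toy embedding table, the function name `note_to_features`, and the two-dimensional vectors are all hypothetical; in practice the vectors would come from a word2vec model trained on the chart notes.

```python
from typing import Dict, List

# Hypothetical toy embeddings for illustration only; real vectors would be
# learned by word2vec from the chart-note corpus.
TOY_VECTORS: Dict[str, List[float]] = {
    "cough": [1.0, 0.0],
    "wheeze": [0.0, 1.0],
    "stable": [-1.0, -1.0],
}


def note_to_features(note: str, vectors: Dict[str, List[float]], dim: int = 2) -> List[float]:
    """Average the vectors of the known words in a note.

    Words missing from the embedding table are skipped; a note with no
    known words maps to the zero vector.
    """
    known = [vectors[w] for w in note.lower().split() if w in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


# One feature vector per note; these could then feed any standard classifier.
features = note_to_features("Patient has cough and wheeze", TOY_VECTORS)
```

Averaging discards word order but gives every note the same number of features regardless of its length, which is what a downstream readmission classifier would need.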

Similar resources

A Multi-label Text Classification Framework: Using Supervised and Unsupervised Feature Selection Strategy

Text classification, the task of assigning metadata to documents, requires significant time and effort when performed by humans. Moreover, with online-generated content growing explosively, manually annotating large-scale, unstructured data becomes a challenge. Currently, many state-of-the-art text mining methods have been applied to the classification process, many of them based on the key wor...

Sentence Based Discourse Classification for Hindi Story Text-to-Speech (TTS) System

In this work, we propose an automatic discourse prediction model that predicts the discourse information for a sentence. The three discourse modes considered in this study are descriptive, narrative, and dialogue. The proposed model is developed using a story corpus, which comprises audio and the corresponding text transcriptions of short children's stories. The development of this mo...

Improving Precision of Keywords Extracted From Persian Text Using Word2Vec Algorithm

Keywords can present the main concepts of the text without human intervention according to the model. Keywords are important vocabulary words that describe the text and play a very important role in accurate and fast understanding of the content. The purpose of extracting keywords is to identify the subject of the text and the main content of the text in the shortest time. Keyword extraction pl...

Modeling Text with Graph Convolutional Network for Cross-Modal Information Retrieval

Cross-modal information retrieval aims to find heterogeneous data of various modalities from a given query of one modality. The main challenge is to map different modalities into a common semantic space, in which the distance between concepts in different modalities can be well modeled. For cross-modal information retrieval between images and texts, existing work mostly uses off-the-shelf Convolutio...

Learning Stylometric Representations for Authorship Analysis

Authorship analysis (AA) is the study of unveiling the hidden properties of authors from a body of exponentially exploding textual data. It extracts an author’s identity and sociolinguistic characteristics based on the reflected writing styles in the text. It is an essential process for various areas, such as cybercrime investigation, psycholinguistics, political socialization, etc. However, mo...

Journal: CoRR

Volume: abs/1503.05123

Publication date: 2015
